Overview

Dataset statistics

Number of variables: 9
Number of observations: 1030
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 11
Duplicate rows (%): 1.1%
Total size in memory: 72.5 KiB
Average record size in memory: 72.1 B
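
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and one-decimal rounding (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics above.
n_rows = 1030
n_duplicates = 11
total_kib = 72.5  # total size in memory, in KiB

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicates / n_rows

# Average record size in bytes (assumption: 1 KiB = 1024 B).
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```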

Variable types

Numeric: 9

Warnings

Dataset has 11 (1.1%) duplicate rows [Duplicates]
water is highly correlated with superplasticizer [High correlation]
superplasticizer is highly correlated with water [High correlation]
age is highly correlated with csMPa [High correlation]
csMPa is highly correlated with age [High correlation]
cement is highly correlated with fineaggregate and 6 other fields [High correlation]
fineaggregate is highly correlated with cement and 5 other fields [High correlation]
coarseaggregate is highly correlated with cement and 5 other fields [High correlation]
flyash is highly correlated with cement and 5 other fields [High correlation]
water is highly correlated with cement and 5 other fields [High correlation]
superplasticizer is highly correlated with cement and 5 other fields [High correlation]
csMPa is highly correlated with cement [High correlation]
slag is highly correlated with cement and 5 other fields [High correlation]
slag has 471 (45.7%) zeros [Zeros]
flyash has 566 (55.0%) zeros [Zeros]
superplasticizer has 379 (36.8%) zeros [Zeros]

Reproduction

Analysis started: 2021-10-30 10:36:55.368680
Analysis finished: 2021-10-30 10:37:40.947800
Duration: 45.58 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

cement
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 278
Distinct (%): 27.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 281.1678641
Minimum: 102
Maximum: 540
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 102
5th percentile: 143.745
Q1: 192.375
Median: 272.9
Q3: 350
95th percentile: 480
Maximum: 540
Range: 438
Interquartile range (IQR): 157.625

Descriptive statistics

Standard deviation: 104.5063645
Coefficient of variation (CV): 0.3716867318
Kurtosis: -0.5206522845
Mean: 281.1678641
Median Absolute Deviation (MAD): 79.4
Skewness: 0.5094811789
Sum: 289602.9
Variance: 10921.58022
Monotonicity: Not monotonic
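
Several of the entries above are derived from the others: the range and IQR are differences of quantiles, the mean is the sum over the number of observations, the CV is the standard deviation over the mean, and the variance is the squared standard deviation. A minimal sketch that checks these relationships for cement, using the figures reported above (the variable names are illustrative):

```python
import math

# Figures copied from the cement statistics above.
n = 1030
minimum, maximum = 102, 540
q1, q3 = 192.375, 350
total = 289602.9
std = 104.5063645

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
mean = total / n                  # Mean = Sum / number of observations
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(value_range, iqr)
print(round(mean, 4), round(cv, 4), round(variance, 2))

# The recomputed values agree with the report up to rounding.
assert math.isclose(mean, 281.1678641, rel_tol=1e-6)
assert math.isclose(cv, 0.3716867318, rel_tol=1e-6)
assert math.isclose(variance, 10921.58022, rel_tol=1e-6)
```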
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
362.6                  20  1.9%
425                    20  1.9%
251.4                  15  1.5%
310                    14  1.4%
446                    14  1.4%
331                    13  1.3%
475                    13  1.3%
250                    13  1.3%
349                    12  1.2%
387                    12  1.2%
Other values (268)    884  85.8%

Value               Count  Frequency (%)
102                     4  0.4%
108.3                   4  0.4%
116                     4  0.4%
122.6                   4  0.4%
132                     2  0.2%
133                     5  0.5%
133.1                   1  0.1%
134.7                   1  0.1%
135                     2  0.2%
135.7                   2  0.2%

Value               Count  Frequency (%)
540                     9  0.9%
531.3                   5  0.5%
528                     1  0.1%
525                     7  0.7%
522                     2  0.2%
520                     2  0.2%
516                     2  0.2%
505                     1  0.1%
500.1                   1  0.1%
500                    10  1.0%

slag
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 185
Distinct (%): 18.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.89582524
Minimum: 0
Maximum: 359.4
Zeros: 471
Zeros (%): 45.7%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 22
Q3: 142.95
95th percentile: 236
Maximum: 359.4
Range: 359.4
Interquartile range (IQR): 142.95

Descriptive statistics

Standard deviation: 86.27934175
Coefficient of variation (CV): 1.167580732
Kurtosis: -0.5081754789
Mean: 73.89582524
Median Absolute Deviation (MAD): 22
Skewness: 0.8007168956
Sum: 76112.7
Variance: 7444.124812
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                     471  45.7%
189                    30  2.9%
106.3                  20  1.9%
24                     14  1.4%
20                     12  1.2%
145                    11  1.1%
98.1                   10  1.0%
19                     10  1.0%
26                      8  0.8%
22                      8  0.8%
Other values (175)    436  42.3%

Value               Count  Frequency (%)
0                     471  45.7%
11                      4  0.4%
13.6                    5  0.5%
15                      5  0.5%
17.2                    1  0.1%
17.5                    1  0.1%
17.6                    1  0.1%
19                     10  1.0%
20                     12  1.2%
22                      8  0.8%

Value               Count  Frequency (%)
359.4                   2  0.2%
342.1                   2  0.2%
316.1                   2  0.2%
305.3                   4  0.4%
290.2                   2  0.2%
288                     4  0.4%
282.8                   4  0.4%
272.8                   2  0.2%
262.2                   5  0.5%
260                     1  0.1%

flyash
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 156
Distinct (%): 15.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.18834951
Minimum: 0
Maximum: 200.1
Zeros: 566
Zeros (%): 55.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 118.3
95th percentile: 167
Maximum: 200.1
Range: 200.1
Interquartile range (IQR): 118.3

Descriptive statistics

Standard deviation: 63.99700415
Coefficient of variation (CV): 1.181010397
Kurtosis: -1.328746435
Mean: 54.18834951
Median Absolute Deviation (MAD): 0
Skewness: 0.5373539058
Sum: 55814
Variance: 4095.616541
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                     566  55.0%
118.3                  20  1.9%
141                    16  1.6%
24.5                   15  1.5%
79                     14  1.4%
94                     13  1.3%
100.4                  11  1.1%
125.2                  10  1.0%
95.7                   10  1.0%
98.8                   10  1.0%
Other values (146)    345  33.5%

Value               Count  Frequency (%)
0                     566  55.0%
24.5                   15  1.5%
59                      1  0.1%
60                      1  0.1%
71                      1  0.1%
71.5                    1  0.1%
75.6                    1  0.1%
76                      1  0.1%
77                      2  0.2%
78                      2  0.2%

Value               Count  Frequency (%)
200.1                   1  0.1%
200                     1  0.1%
195                     3  0.3%
194.9                   1  0.1%
194                     1  0.1%
193                     1  0.1%
190                     1  0.1%
187                     1  0.1%
185.3                   1  0.1%
185                     2  0.2%

water
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 195
Distinct (%): 18.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 181.5672816
Minimum: 121.8
Maximum: 247
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 121.8
5th percentile: 146.1
Q1: 164.9
Median: 185
Q3: 192
95th percentile: 228
Maximum: 247
Range: 125.2
Interquartile range (IQR): 27.1

Descriptive statistics

Standard deviation: 21.35421857
Coefficient of variation (CV): 0.1176104989
Kurtosis: 0.1220816744
Mean: 181.5672816
Median Absolute Deviation (MAD): 13
Skewness: 0.07462838429
Sum: 187014.3
Variance: 456.0026505
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
192                   118  11.5%
228                    54  5.2%
185.7                  46  4.5%
203.5                  36  3.5%
186                    28  2.7%
164.9                  20  1.9%
162                    20  1.9%
185                    15  1.5%
153.5                  15  1.5%
200                    14  1.4%
Other values (185)    664  64.5%

Value               Count  Frequency (%)
121.8                   5  0.5%
126.6                   5  0.5%
127                     1  0.1%
127.3                   1  0.1%
137.8                   5  0.5%
140                     1  0.1%
140.8                   5  0.5%
141.8                   5  0.5%
142                     1  0.1%
143.3                   5  0.5%

Value               Count  Frequency (%)
247                     1  0.1%
246.9                   1  0.1%
237                     1  0.1%
236.7                   1  0.1%
228                    54  5.2%
221.4                   1  0.1%
221                     2  0.2%
220.1                   1  0.1%
220                     2  0.2%
219.7                   1  0.1%

superplasticizer
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 111
Distinct (%): 10.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.204660194
Minimum: 0
Maximum: 32.2
Zeros: 379
Zeros (%): 36.8%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 6.4
Q3: 10.2
95th percentile: 16.055
Maximum: 32.2
Range: 32.2
Interquartile range (IQR): 10.2

Descriptive statistics

Standard deviation: 5.973841392
Coefficient of variation (CV): 0.9627991228
Kurtosis: 1.411268965
Mean: 6.204660194
Median Absolute Deviation (MAD): 5.3
Skewness: 0.9072025749
Sum: 6390.8
Variance: 35.68678098
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                     379  36.8%
11.6                   37  3.6%
8                      27  2.6%
7                      19  1.8%
6                      17  1.7%
9.9                    16  1.6%
8.9                    16  1.6%
7.8                    16  1.6%
9                      16  1.6%
10                     15  1.5%
Other values (101)    472  45.8%

Value               Count  Frequency (%)
0                     379  36.8%
1.7                     4  0.4%
1.9                     1  0.1%
2                       1  0.1%
2.2                     1  0.1%
2.5                     2  0.2%
3                       6  0.6%
3.1                     1  0.1%
3.4                     3  0.3%
3.6                     5  0.5%

Value               Count  Frequency (%)
32.2                    5  0.5%
28.2                    5  0.5%
23.4                    5  0.5%
22.1                    1  0.1%
22                      6  0.6%
20.8                    1  0.1%
20                      1  0.1%
19                      1  0.1%
18.8                    1  0.1%
18.6                    5  0.5%

coarseaggregate
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 284
Distinct (%): 27.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 972.918932
Minimum: 801
Maximum: 1145
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 801
5th percentile: 842
Q1: 932
Median: 968
Q3: 1029.4
95th percentile: 1104
Maximum: 1145
Range: 344
Interquartile range (IQR): 97.4

Descriptive statistics

Standard deviation: 77.75395397
Coefficient of variation (CV): 0.07991822485
Kurtosis: -0.5990161032
Mean: 972.918932
Median Absolute Deviation (MAD): 46.3
Skewness: -0.04021974481
Sum: 1002106.5
Variance: 6045.677357
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
932                    57  5.5%
852.1                  45  4.4%
944.7                  30  2.9%
968                    29  2.8%
1125                   24  2.3%
1047                   19  1.8%
967                    19  1.8%
974                    12  1.2%
942                    12  1.2%
938                    12  1.2%
Other values (274)    771  74.9%

Value               Count  Frequency (%)
801                     4  0.4%
801.1                   1  0.1%
801.4                   1  0.1%
811                     2  0.2%
814                     1  0.1%
814.1                   1  0.1%
817.9                   1  0.1%
818                     1  0.1%
819                     2  0.2%
819.2                   1  0.1%

Value               Count  Frequency (%)
1145                    1  0.1%
1134.3                  5  0.5%
1130                    1  0.1%
1125                   24  2.3%
1124.4                  2  0.2%
1120                    2  0.2%
1119                    2  0.2%
1118.8                  2  0.2%
1118                    1  0.1%
1113                    2  0.2%

fineaggregate
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 302
Distinct (%): 29.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 773.5804854
Minimum: 594
Maximum: 992.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 594
5th percentile: 613
Q1: 730.95
Median: 779.5
Q3: 824
95th percentile: 898.09
Maximum: 992.6
Range: 398.6
Interquartile range (IQR): 93.05

Descriptive statistics

Standard deviation: 80.17598014
Coefficient of variation (CV): 0.1036427129
Kurtosis: -0.1021769893
Mean: 773.5804854
Median Absolute Deviation (MAD): 45.5
Skewness: -0.2530095977
Sum: 796787.9
Variance: 6428.187792
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
755.8                  30  2.9%
594                    30  2.9%
670                    23  2.2%
613                    22  2.1%
801                    16  1.6%
746.6                  15  1.5%
887.1                  15  1.5%
712                    14  1.4%
845                    14  1.4%
750                    12  1.2%
Other values (292)    839  81.5%

Value               Count  Frequency (%)
594                    30  2.9%
605                     5  0.5%
611.8                   5  0.5%
612                     1  0.1%
613                    22  2.1%
613.2                   2  0.2%
614                     1  0.1%
623                     2  0.2%
630                     5  0.5%
631                     4  0.4%

Value               Count  Frequency (%)
992.6                   5  0.5%
945                     4  0.4%
943.1                   4  0.4%
942                     4  0.4%
925.7                   5  0.5%
905.9                   5  0.5%
903.8                   5  0.5%
903.6                   5  0.5%
901.8                   5  0.5%
900.9                   5  0.5%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 14
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.66213592
Minimum: 1
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 3
Q1: 7
Median: 28
Q3: 56
95th percentile: 180
Maximum: 365
Range: 364
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 63.16991158
Coefficient of variation (CV): 1.383419989
Kurtosis: 12.16898898
Mean: 45.66213592
Median Absolute Deviation (MAD): 21
Skewness: 3.269177401
Sum: 47032
Variance: 3990.437729
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Value               Count  Frequency (%)
28                    425  41.3%
3                     134  13.0%
7                     126  12.2%
56                     91  8.8%
14                     62  6.0%
90                     54  5.2%
100                    52  5.0%
180                    26  2.5%
91                     22  2.1%
365                    14  1.4%
Other values (4)       24  2.3%

Value               Count  Frequency (%)
1                       2  0.2%
3                     134  13.0%
7                     126  12.2%
14                     62  6.0%
28                    425  41.3%
56                     91  8.8%
90                     54  5.2%
91                     22  2.1%
100                    52  5.0%
120                     3  0.3%

Value               Count  Frequency (%)
365                    14  1.4%
360                     6  0.6%
270                    13  1.3%
180                    26  2.5%
120                     3  0.3%
100                    52  5.0%
91                     22  2.1%
90                     54  5.2%
56                     91  8.8%
28                    425  41.3%

csMPa
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 845
Distinct (%): 82.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.81796117
Minimum: 2.33
Maximum: 82.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.2 KiB

Quantile statistics

Minimum: 2.33
5th percentile: 10.961
Q1: 23.71
Median: 34.445
Q3: 46.135
95th percentile: 66.802
Maximum: 82.6
Range: 80.27
Interquartile range (IQR): 22.425

Descriptive statistics

Standard deviation: 16.70574196
Coefficient of variation (CV): 0.4664068366
Kurtosis: -0.3137248604
Mean: 35.81796117
Median Absolute Deviation (MAD): 10.93
Skewness: 0.4169772884
Sum: 36892.5
Variance: 279.0818145
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
33.4                    6  0.6%
77.3                    4  0.4%
79.3                    4  0.4%
31.35                   4  0.4%
71.3                    4  0.4%
35.3                    4  0.4%
23.52                   4  0.4%
41.05                   4  0.4%
44.28                   3  0.3%
41.54                   3  0.3%
Other values (835)    990  96.1%

Value               Count  Frequency (%)
2.33                    1  0.1%
3.32                    1  0.1%
4.57                    1  0.1%
4.78                    1  0.1%
4.83                    1  0.1%
4.9                     1  0.1%
6.27                    1  0.1%
6.28                    1  0.1%
6.47                    1  0.1%
6.81                    1  0.1%

Value               Count  Frequency (%)
82.6                    1  0.1%
81.75                   1  0.1%
80.2                    1  0.1%
79.99                   1  0.1%
79.4                    1  0.1%
79.3                    4  0.4%
78.8                    1  0.1%
77.3                    4  0.4%
76.8                    1  0.1%
76.24                   1  0.1%

Interactions

[Figures: 81 pairwise interaction plots, one per ordered pair of the 9 variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
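
The calculation described above can be sketched in a few lines. This is an illustrative pure-Python version, not the code the report itself runs; the function name `pearson_r` and the toy data are assumptions made for the example:

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy data (not taken from the dataset): y is an exact linear function of x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

r_linear = pearson_r(x, y)
# Shifting and positively rescaling y leaves r unchanged.
r_shifted = pearson_r(x, [3 * v + 7 for v in y])
# Reversing the direction of the relationship flips the sign.
r_reversed = pearson_r(x, list(reversed(y)))

print(r_linear, r_shifted, r_reversed)
```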

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
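
As a sketch of the description above, ρ can be computed by ranking each variable and then applying the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are assumptions for illustration; tied values receive the mean of their rank positions, which is a common convention but not stated in the report:

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data (not from the dataset): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rho = spearman_rho(x, y)   # perfect monotonic association
r = pearson_r(x, y)        # below 1 because the relation is not linear
print(rho, r)
```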

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
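
The pair-counting definition above can be sketched directly. This illustrative version implements the formula exactly as stated, with no correction for ties (often called tau-a); statistical libraries may apply a tie correction, so their results can differ on data with repeated values. The function name and toy data are assumptions for the example:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts as neither
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# Toy data (not from the dataset): 5 concordant pairs, 1 discordant pair.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)  # (5 - 1) / 6
```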

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

      cement   slag  flyash  water  superplasticizer  coarseaggregate  fineaggregate  age  csMPa
   0   540.0    0.0     0.0  162.0               2.5           1040.0          676.0   28  79.99
   1   540.0    0.0     0.0  162.0               2.5           1055.0          676.0   28  61.89
   2   332.5  142.5     0.0  228.0               0.0            932.0          594.0  270  40.27
   3   332.5  142.5     0.0  228.0               0.0            932.0          594.0  365  41.05
   4   198.6  132.4     0.0  192.0               0.0            978.4          825.5  360  44.30
   5   266.0  114.0     0.0  228.0               0.0            932.0          670.0   90  47.03
   6   380.0   95.0     0.0  228.0               0.0            932.0          594.0  365  43.70
   7   380.0   95.0     0.0  228.0               0.0            932.0          594.0   28  36.45
   8   266.0  114.0     0.0  228.0               0.0            932.0          670.0   28  45.85
   9   475.0    0.0     0.0  228.0               0.0            932.0          594.0   28  39.29

Last rows

      cement   slag  flyash  water  superplasticizer  coarseaggregate  fineaggregate  age  csMPa
1020   288.4  121.0     0.0  177.4               7.0            907.9          829.5   28  42.14
1021   298.2    0.0   107.0  209.7              11.1            879.6          744.2   28  31.88
1022   264.5  111.0    86.5  195.5               5.9            832.6          790.4   28  41.54
1023   159.8  250.0     0.0  168.4              12.2           1049.3          688.2   28  39.46
1024   166.0  259.7     0.0  183.2              12.7            858.8          826.8   28  37.92
1025   276.4  116.0    90.3  179.6               8.9            870.1          768.3   28  44.28
1026   322.2    0.0   115.6  196.0              10.4            817.9          813.4   28  31.18
1027   148.5  139.4   108.6  192.7               6.1            892.4          780.0   28  23.70
1028   159.1  186.7     0.0  175.6              11.3            989.6          788.9   28  32.77
1029   260.9  100.5    78.3  200.6               8.6            864.5          761.5   28  32.40

Duplicate rows

Most frequently occurring

      cement   slag  flyash  water  superplasticizer  coarseaggregate  fineaggregate  age  csMPa  # duplicates
   1   362.6  189.0     0.0  164.9              11.6            944.7          755.8    3  35.30             4
   3   362.6  189.0     0.0  164.9              11.6            944.7          755.8   28  71.30             4
   4   362.6  189.0     0.0  164.9              11.6            944.7          755.8   56  77.30             4
   5   362.6  189.0     0.0  164.9              11.6            944.7          755.8   91  79.30             4
   2   362.6  189.0     0.0  164.9              11.6            944.7          755.8    7  55.90             3
   6   425.0  106.3     0.0  153.5              16.5            852.1          887.1    3  33.40             3
   7   425.0  106.3     0.0  153.5              16.5            852.1          887.1    7  49.20             3
   8   425.0  106.3     0.0  153.5              16.5            852.1          887.1   28  60.29             3
   9   425.0  106.3     0.0  153.5              16.5            852.1          887.1   56  64.30             3
  10   425.0  106.3     0.0  153.5              16.5            852.1          887.1   91  65.20             3
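
A table of this kind can be built by counting how often each identical row occurs and keeping the rows seen more than once. The sketch below uses made-up toy rows rather than the dataset, and is only one plausible way to produce such counts; it is not necessarily how the report derives its figures:

```python
from collections import Counter

# Toy rows (hypothetical, not taken from the dataset); each tuple is one record.
rows = [
    (1.0, 2.0, 28),
    (1.0, 2.0, 28),
    (3.0, 0.0, 7),
    (1.0, 2.0, 28),
    (3.0, 0.0, 7),
    (5.0, 1.0, 3),
]

# Count occurrences of each distinct row.
counts = Counter(rows)

# Keep rows that occur more than once, most frequent first.
duplicated = sorted(
    ((row, n) for row, n in counts.items() if n > 1),
    key=lambda item: item[1],
    reverse=True,
)

for row, n in duplicated:
    print(row, n)
```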